
 independence relation


Supplementary Material: Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias

Neural Information Processing Systems

In this section we provide a detailed proof of the correctness and completeness of the ICD algorithm. For easier reference, we describe ICD in Algorithm 1 and restate the ICD-Sep conditions.

A set $\mathbf{Z}$ is a subset of ICD-Sep$(A,B)$ given $r \in \{0,\ldots,|\mathbf{O}|-2\}$ if and only if

1. $|\mathbf{Z}| = r$,
2. $\forall Z \in \mathbf{Z}$, there exists a PDS-path $\Pi_B(A,Z)$ such that (a) $|\Pi_B(A,Z)| \le r$ and (b) every node on $\Pi_B(A,Z)$ is in $\mathbf{Z}$, and
3. $\forall Z \in \mathbf{Z}$, node $Z$ is a possible ancestor of $A$ or $B$ (not a necessary condition).

Denote by $A, B$ a pair of nodes from $\mathbf{O}$ that are connected in $\mathcal{G}$ and disconnected in $\mathcal{D}$, and such that $A$ is not an ancestor of $B$ in $\mathcal{D}$. If $A \perp\!\!\!\perp B \mid [\mathbf{Z}'] \cup \mathbf{S}$, where $\mathbf{Z}' \subset \mathbf{O}$ is a minimal separating set of size $n+1$, then there exists a subset $\mathbf{Z} \subset \mathbf{O}$ of the same size $n+1$ such that $A \perp\!\!\!\perp B \mid \mathbf{Z} \cup \mathbf{S}$, and for every node $Z \in \mathbf{Z}$ there exists a PDS-path $\Pi_B(A,Z)$ in $\mathcal{G}$ such that every node $V$ on the PDS-path is also in $\mathbf{Z}$.

Proof. It was previously shown that a minimal separating set for $A$ and $B$, where $A$ is not an ancestor of $B$, is a subset of D-Sep$(A,B)$ (Spirtes et al., 2000, page 134 and Theorem 6.2; Spirtes et al., 1999).
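Conditions 1 and 2 of ICD-Sep, as stated above, can be read as a membership check on a candidate set. The following is a minimal sketch, not the authors' implementation. It assumes the PDS-paths from A have already been enumerated and are passed in as a hypothetical `pds_paths` mapping, that each path is stored as the node sequence after A and ending in the target node, and that path length is counted in edges. Condition 3 is omitted because the text marks it as not necessary.

```python
from typing import Dict, Hashable, List, Sequence, Set

Node = Hashable


def satisfies_icd_sep_conditions(
    z_set: Set[Node],
    r: int,
    pds_paths: Dict[Node, List[Sequence[Node]]],
) -> bool:
    """Check ICD-Sep conditions 1 and 2 for a candidate set.

    pds_paths[z] is a hypothetical, precomputed list of PDS-paths from A
    to z. Each path lists the nodes after A, ending in z (an assumption),
    so len(path) equals the number of edges on the path.
    """
    # Condition 1: the candidate set has exactly r nodes.
    if len(z_set) != r:
        return False
    # Condition 2: every member is reachable from A by some PDS-path that
    # (a) has length at most r and (b) stays entirely inside the candidate set.
    for z in z_set:
        has_valid_path = any(
            len(path) <= r and all(v in z_set for v in path)
            for path in pds_paths.get(z, [])
        )
        if not has_valid_path:
            return False
    return True


# Toy example: A is adjacent to X, and X is adjacent to Y.
example_paths = {"X": [["X"]], "Y": [["X", "Y"]]}
```

With this toy input, `{"X"}` passes for `r = 1`, while `{"Y"}` fails for `r = 1` because its only path has length 2 and passes through `X`, which is outside the candidate set.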



An Algorithm to Learn Polytree Networks with Hidden Nodes

Neural Information Processing Systems

Ancestral graphs are a prevalent mathematical tool to take into account latent (hidden) variables in a probabilistic graphical model. In ancestral graph representations, the nodes are only the observed (manifest) variables and the notion of m-separation fully characterizes the conditional independence relations among such variables, bypassing the need to explicitly consider latent variables. However, ancestral graph models do not necessarily represent the actual causal structure of the model, and do not contain information about, for example, the precise number and location of the hidden variables. Being able to detect the presence of latent variables while also inferring their precise location within the actual causal structure model is a more challenging task that provides more information about the actual causal relationships among all the model variables, including the latent ones. In this article, we develop an algorithm to exactly recover graphical models of random variables with underlying polytree structures when the latent nodes satisfy specific degree conditions. Therefore, this article proposes an approach for the full identification of hidden variables in a polytree. We also show that the algorithm is complete in the sense that when such degree conditions are not met, there exists another polytree with fewer latent nodes that satisfies the degree conditions and entails the same independence relations among the observed variables, making it indistinguishable from the actual polytree.






Markov Conditions and Factorization in Logical Credal Networks

arXiv.org Artificial Intelligence

We examine the recently proposed language of Logical Credal Networks, in particular investigating the consequences of various Markov conditions. We introduce the notion of structure for a Logical Credal Network and show that a structure without directed cycles leads to a well-known factorization result. For networks with directed cycles, we analyze the differences between Markov conditions, factorization results, and specification requirements.


Neural Bayesian Network Understudy

arXiv.org Artificial Intelligence

Bayesian Networks may be appealing for clinical decision-making due to their inclusion of causal knowledge, but their practical adoption remains limited as a result of their inability to deal with unstructured data. While neural networks do not have this limitation, they are not interpretable and are inherently unable to deal with causal structure in the input space. Our goal is to build neural networks that combine the advantages of both approaches. Motivated by the prospect of injecting causal knowledge while training such neural networks, this work presents initial steps in that direction. We demonstrate how a neural network can be trained to output conditional probabilities, providing approximately the same functionality as a Bayesian Network. Additionally, we propose two training strategies that allow encoding the independence relations inferred from a given causal structure into the neural network.


Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias

arXiv.org Artificial Intelligence

We present a sound and complete algorithm, called iterative causal discovery (ICD), for recovering causal graphs in the presence of latent confounders and selection bias. ICD relies on the causal Markov and faithfulness assumptions and recovers the equivalence class of the underlying causal graph. It starts with a complete graph, and consists of a single iterative stage that gradually refines this graph by identifying conditional independence (CI) between connected nodes. Independence and causal relations entailed after any iteration are correct, rendering ICD anytime. Essentially, we tie the size of the CI conditioning set to its distance on the graph from the tested nodes, and increase this value in each successive iteration. Thus, each iteration refines a graph that was recovered by previous iterations having smaller conditioning sets, and hence higher statistical power, which contributes to stability. We demonstrate empirically that ICD requires significantly fewer CI tests and learns more accurate causal graphs compared to FCI, FCI+, and RFCI algorithms.
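The iterative refinement described above (start from a complete graph, then remove edges between connected nodes as CI is found with conditioning sets of growing size) can be illustrated with a deliberately simplified sketch. This is not ICD itself: candidate conditioning sets are drawn from all remaining nodes instead of from ICD-Sep, no graph-distance restriction is applied, and no edge orientation is performed. The function name `refine_skeleton` and the `is_independent` oracle are hypothetical names introduced only for this example.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]
CITest = Callable[[str, str, Tuple[str, ...]], bool]


def refine_skeleton(
    nodes: Iterable[str], is_independent: CITest
) -> Tuple[Set[Edge], Dict[Edge, Tuple[str, ...]]]:
    """Remove edges from a complete graph using CI tests of growing order r."""
    ordered = sorted(nodes)
    # Start with a complete undirected graph over the observed nodes.
    edges: Set[Edge] = {frozenset(pair) for pair in combinations(ordered, 2)}
    sepsets: Dict[Edge, Tuple[str, ...]] = {}
    # One iteration per conditioning-set size r = 0, ..., |O| - 2.
    for r in range(len(ordered) - 1):
        for a, b in combinations(ordered, 2):
            edge = frozenset((a, b))
            if edge not in edges:
                continue  # only currently connected pairs are tested
            others = [n for n in ordered if n not in (a, b)]
            # Simplification: every size-r subset of the other nodes is tried.
            for cond in combinations(others, r):
                if is_independent(a, b, cond):
                    edges.discard(edge)
                    sepsets[edge] = cond
                    break
    return edges, sepsets


def chain_oracle(a: str, b: str, cond: Tuple[str, ...]) -> bool:
    """Toy CI oracle for the chain X -> Y -> Z: X and Z separate given Y."""
    return {a, b} == {"X", "Z"} and "Y" in cond


skeleton, separating_sets = refine_skeleton(["X", "Y", "Z"], chain_oracle)
```

On the toy chain, the edge between `X` and `Z` survives the `r = 0` iteration and is removed at `r = 1` with separating set `("Y",)`. The graph after each iteration is obtained using only tests of order at most `r`, which is the property the abstract connects to stability.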


Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper

arXiv.org Artificial Intelligence

Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few statistical tests as possible. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. Through an ablation study on synthetic data and through common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.